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Determining the genetic architecture of complex traits is challenging because phenotypic variation arises from interactions 
between multiple, environmentally sensitive alleles. We quantified genome-wide transcript abundance and phenotypes for six 
ecologically relevant traits in D. melanogaster wild-derived inbred lines. We observed 10,096 genetically variable transcripts 
and high heritabilities for all organismal phenotypes. The transcriptome is highly genetically intercorrelated, forming 241 
transcriptional modules. Modules are enriched for transcripts in common pathways, gene ontology categories, tissue-specific 
expression and transcription factor binding sites. The high degree of transcriptional connectivity allows us to infer genetic 
networks and the function of predicted genes from annotations of other genes in the network. Regressions of organismal 
phenotypes on transcript abundance implicate several hundred candidate genes that form modules of biologically meaningful 
correlated transcripts affecting each phenotype. Overlapping transcripts in modules associated with different traits provide insight 
into the molecular basis of pleiotropy between complex traits. 



< Natural populations harbor a wide range of phenotypic ^ 
82 for all aspects of morphology, physiology, behaviors and disease 
■g susceptibility. Knowledge of the genetic basis of this variation is 
z important for understanding adaptive evolution, deriving elite 
o domestic crop and animal strains and improving human health. 
* However, determining the genetic architecture of natural pheno- 
® typic variation is challenging because most phenotypic variation is 
attributable to segregating alleles at many interacting genes with 
jffa environmentally sensitive effects 1 - 2 . 

Recent genome-wide association studies in human populations 
have reported that diseases and quantitative traits are associated 
with loci with small effects, which together account for a few percent 
of the phenotypic variance for that trait 3 . These findings suggest that 
the bulk of genetic variation for complex traits may be due to alleles 
with small and possibly context-dependent effects. Further, the 
reported associated variants are often found in genes with no a priori 
expected relationship to the trait, in computationally predicted genes, 
or in gene deserts. In Drosophila, quantitative analyses of new 
mutations have revealed large numbers of loci affecting quantitative 
traits 4 , as have high-resolution maps of segregating quantitative trait 
loci (QTLs) in Drosophila 4 and mice 5 . Single alleles often have 
pleiotropic effects on multiple traits, epistatic interactions among 
loci affecting the same trait are common, and allelic effects can be 
conditional on sex and the £ 



Our understanding of the genetic architecture of quantitative traits 
in model systems, and ultimately humans, will benefit from inter- 
rogating a single resource population for variation in DNA sequence, 
transcript abundance, proteins and metabolites; for multiple organis- 
mal phenotypes; and in multiple environments. This 'systems genetics' 
approach will yield a detailed map of genetic variants associated with 
each organismal phenotype in each environment, provide a functional 
context for interpreting the phenotypes, elucidate the genetic under- 
pinnings that govern the interdependence of multiple phenotypes, and 
address the long-standing question of the genetic basis of genotype by 



Here we report the first step of a systems-genetics analysis of the 
genetic basis of complex traits in Drosophila. We demonstrate ubiqui- 
tous variation in transcript abundance among inbred Drosophila 
strains recently derived from the wild, and show that the variable 
transcripts can be grouped into coherent modules of intercorrelated 
genes. These lines harbor substantial genetic variation for ecologi- 
cally relevant complex traits, and variation for hundreds of transcripts 
is associated with variation for each of the organismal traits. 
Transcripts associated with each trait form correlated transcriptional 
modules from which we can construct networks of interacting 
genes with plausible biological relationships to each other and 
the traits. These genetic networks provide additional func- 
tional annotation of the Drosophila genome, and an integrated 
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82 context in which to frame predictions of the behavior of the 

*g system following genetic or environmental perturbations. 

Z 

g RESULTS 

8 Natural variation in transcript abundance 

© We derived 40 highly inbred lines from the natural population of 
Raleigh, North Carolina, USA. These lines are a living library of 
A3ik common polymorphisms affecting complex traits. We assessed whole- 
WS* genome variation in transcript abundance for young males and 
females of each of these lines using Affymetrix Drosophila 2.0 arrays. 
We standardized the raw array data by median centering and used a 
statistical approach to identify outlier probes in each perfect-match 
probe set that contained single feature polymorphisms (SFPs) between 
the wild-derived lines and the reference strain sequence used to design 
the array. We used the median log 2 signal intensity of the remaining 
perfect-match probes in each probe set as the measure of expression. 
Of the 18,800 transcripts on the array, 14,840 (78.9%) were expressed 
in young adults (Supplementary Table 1 online). Many genes that 
have been characterized for their role in early development are also 
expressed in adults, and may affect adult phenotypes that cannot be 
predicted from their prior developmental functions 9 ' 10 . We used 
analysis of variance (ANOVA) to partition variation in expression 
between sexes, among lines, and the sex x line interaction for each 
expressed transcript. 

The sex term was significant at a conservative false discovery rate 
(FDR) 11 of 0.001 for 13, 086 (88.2%) of the expressed transcripts 
(Fig. la and Supplementary Table 1), indicating pervasive sexual 
dimorphism for gene expression. A total of 3,255 transcripts had 
twofold or greater differences in expression between the sexes: 1,690 



Figure 1 Variation in transcript abundance among 
40 wild-derived inbred lines, (a) Sex bias for gene 
expression. Blue and red dots represent genes 
showing a twofold difference in gene expression 
between males and females, respectively, 
(b) Distribution of broad-sense heritabilities (H 2 ). 
Dark green denotes significant H 2 estimates (line 
FDR < 0.001) and grey indicates nonsignificant 
H 2 estimates, (c) Distribution of cross-sex genetic 
correlations for transcripts showing significant 
variation in sexual dimorphism (significant sex x 
line interaction variance at FDR < 0.001). 
(d) Bivanate plot of H 2 estimates in males and 
females. Orange dots indicate significant line- 
by-sex interaction variance, (e) Chromosomal 
distribution of sex-biased gene expression. The 
dark blue and red bars are observed male and 
female counts, respectively, and the light blue 
and red bars are the expected numbers of male 
and female transcripts, respectively. Asterisks 
indicate significant deviation of observed from 
expected values (P < 0.001). 



with female-biased expression, and 1,565 with 
male-biased expression. Previous studies 
reported largely male-biased expression in 
D. melanogaster 12 ' 13 . Our observation of 
nearly equal numbers of transcripts with 
strong male and female expression bias is 
likely attributable to our greater power to 
detect smaller sex-biased effects, as the abso- 
lute magnitude of the difference in expression 
between the sexes is less for female-biased 
than for male-biased transcripts (Fig. la). Gene ontology analysis 14 
showed that both male- and female-biased transcripts were enriched 
for genes affecting reproduction and gametogenesis. The female- 
biased transcripts were also enriched for genes affecting basic cellular 
processes, and the male-biased transcripts for genes involved in 
reproductive behavior and physiology, mitochondrial energy metabo- 
lism and intermediary metabolism (Supplementary Table 2 online). 
The line term was significant at an FDR < 0.001 for 10,096 (68.0%) 



of the expressed transcripts, indicating considerable genetic v; 
in gene expression (Supplementary Table 1). Estimates of broad-sense 
heritability (H 2 ) for significant transcripts ranged from « 0.3-1.0 
(Fig lb). Transcripts with high levels of genetic v 
0.8) were enriched for genes involved with responses t( 
ment, whereas transcripts with low levels of genetic v. 
0.2) were enriched for genes affecting processes essential for survival 
(Supplementary Table 2). The overall correlation (r) between H 2 and 
mean expression was low, although statistically significant (r = 0.078, 
P = 3.13 x 10~ 21 ). The high level of genetic variation in gene 
expression is partly attributable to the doubling of the additive genetic 
variation of an outbred population in a population of fully inbred 
lines, and inflation of broad-sense heritability estimates by any 
nonadditive genetic variance 1 . Significant heritability of abundance 
of a particular transcript does not necessarily mean that as-acting 
genetic polymorphisms cause the variation; it could be due to trans 
regulation by another genetically variable transcript 6-8 . 

A significant sex x line interaction indicates genetic variation in the 
magnitude of the sex dimorphism among the lines, or equivalently, a 
significant departure of the cross-sex genetic correlation (r MF ) from 
unity. The sex x line interaction was significant for 4,108 (40.7%) of 
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the expressed transcripts at an FDR < 0.001 (Supplementary 
Table 1). The average cross-sex genetic correlation of the transcripts 
showing genetic variation in sexual dimorphism was quite low (r MF = 
0.234), with a mode at r MF = 0 (Fig. lc). Variation in expression of 
many transcripts among the lines was uncorrelated between males and 
females, arguing for caution when extrapolating inferences about 
variation in gene expression from expression profiles drawn from 
one sex to the other sex. These data also reveal great potential for rapid 
evolution of sex-biased gene expression, as observed when different 
Drosophila species are compared 13 . Low cross-sex genetic correlations 



Figure 2 Correlated transcriptional modules, 
(a) Distribution of connectivity (average \r\) for 
the 10,096 genetically variable transcripts 
(line FDR < 0.001). (b) Clustering of the 
genetically variable transcripts into 241 modules, 
(c) Relationship between transcript H 2 and ave- 
rage connectivity. Error bars, s.e.m. (d) Correlated 
transcriptional modules for genes in the amino 
sugars metabolism and Notch pathway KEGG 
ontologies. The colors on the off-diagonal 
represent the average cross-module correlations. 



were only partly attributable to transcripts 
that were expressed and variable in only one 
sex. Many of the transcripts showing varia- 
tion in sex dimorphism in expression were 
actually expressed in both sexes, but had 
much higher heritabilities in one sex com- 
pared to the other (Fig. Id). Transcripts with 
low cross-sex genetic correlations (r MF < 0.2) 
were enriched for the same gene ontology 
categories as sex-biased genes, indicating that 
sex-biased transcripts are also genetically vari- 
able in the sex in which they are highly 
expressed (Supplementary Table 2). 

The patterning of genetic variation within a population depends on 
effective population size, recombination rate and the selection coeffi- 
cient of new mutations 1 . These factors vary among chromosomes: 
X chromosomes have smaller effective population sizes than auto- 
somes and are hemizygous in males, and recombination is severely 
reduced on the Drosophila fourth chromosome. Therefore, we asked 
whether there were differences among chromosomes in mean level 
and genetic variance of transcript abundance. Consistent with pre- 
vious studies 12 ' 15 , we found that male-biased transcripts were strongly 




Figure 3 Biology of transcriptional modules, (a) Distribution of tissue- 
specific expression in modules 7, 18, 23, 66, 91. Module 7 is enriched for 
male-biased transcripts and expression in the testes and accessory glands. 
Module 18 is enriched for female-biased transcripts and expression in 
ovaries. Module 23 is enriched for transcripts affecting reproduction and 
gametogenesis that are highly expressed in ovaries and male accessory 
glands. Module 66 is enriched for transcripts in the Notch signaling 
pathway and nervous system development expressed in the midgut. Module 
91 is enriched for transcripts affecting the function of the nervous system with X 
enriched for the Abd-b (P= 0.004) and Adf-1 (P= 0.001) transcription factor 
and Adf-1 in memory and synaptogenesis 48 , consistent with the inferred functior 
mphasizing the genetic correlations between adult transcripts for t 
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(d) Putative functional annotation of CG15065 
their correlation to CG15065 shows that IM1 is 
alignments of CG15065, IM1 and IM2 are highly conserved. 
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Figure 4 Variation for organismal phenotypes among 40 wild-derived inbred lines, (a-f) Distributions of line means among 40 wild-derived inbred lines. The 
red and blue bars in panels a-d depict females and males, respectively. Sexes were not measured separately in panels e-f. Error bars, s.e.m. (a) Starvation 
stress resistance (hP = 0.56). (b) Chill coma recovery (H 2 = 0.23). (c) Life span (H 2 = 0.54). (d) Locomotor reactivity (H 2 = 0.58). (e) Copulation latency 
(H 2 = 0.25). (f) Competitive fitness (H 2 = 0.32). 



~ under-represented on the X chromosome (and overrepresented on 
g chromosome 21). In contrast, female-biased transcripts were strongly 
< overrepresented on the X chromosome (Fig. le). Possibly the X 
£ chromosome is a hospitable location for female- but not male-biased 
*j genes because mutations with X-linked deleterious effects on male 
Z fitness are quickly purged from populations, but mutations with 
© recessive X-linked deleterious effects on female fitness can achieve 
§ higher frequencies because they are protected from natural selection 
© when they are rare. We found differences in overall transcript 
abundance among the chromosomes for both sexes (P < 0.0001), 
A3ik with chromosome 4 having the highest mean expression and the X 
WS* chromosome the lowest mean expression (data not shown). We 
also found differences in H 2 of transcript abundance between the 
chromosomes (P < 0.0001), largely attributable to reduced genetic 
l the X and fourth chromosomes (data not shown). 



Modules of correlated transcripts 

We assessed the extent to which the 10,096 variable transcripts were 
genetically correlated among the lines (Fig. 2). We computed pairwise 
genetic correlations among all the variable transcripts, and 
computed the mean absolute value of the pairwise genetic correlations 
of each transcript with all other transcripts as a measure of average 
connectivity (|r|, Supplementary Table 1). The distribution of |r| was 
strongly skewed towards high values, with a mode at 0.6 (Fig. 2a). 
Thus, the genome as a whole is highly genetically correlated at the 
transcriptional level, which imposes constraints on the evolution of 
transcriptional genetic networks. There is a strong inverse correlation 
(r = -0.263, P = 1.08 x 10 -159 ) between transcript heritability and 
average connectivity — the most highly heritable transcripts have low 
mean values of |r| (Fig. 2c) and are therefore presumably less 
evolutionarily constrained. Indeed, there is a significant positive 
correlation between heritability of transcript abundance and co, the 



ratio of nonsynonymous to synonymous substitutions, among single- 
copy genes with orthologs in six melanogaster group species 16 (r = 
0.132, P = 7.56 x 10~ 25 ). Genes encoding transcripts with lower 
heritabilities experience stronger purifying selection than do genes 
encoding transcripts with high heritabilities. Although high heritabil- 
ities are expected for genes under mutation- drift equilibrium 1 , it is not 
likely that this mechanism accounts for the observed high heritabilities 
of transcript abundance, as the estimates of co for these loci are much 
less than the neutral expectation of unity. In addition, the high 
heritability transcripts are predominantly for genes affecting responses 
to the environment, which have been associated with responses to 
artificial selection for multiple traits 17-20 . Therefore, the high herit- 
abilities for these transcripts could be the result of more complex 
evolutionary dynamics. 

We grouped the genetically variable transcripts into modules using 
a novel method to identify separable clusters of highly interconnected 
genes (E.A.S. and J.F.A. unpublished data). The 10,096 transcripts fell 
into 241 modules (Fig. 2b and Supplementary Table 1). The two 
largest modules (7 and 18) consisted of 1,765 and 4,128 transcripts, 
with average absolute intramodule correlations of 0.89 and 0.77, 
respectively. Moreover, the expression of genes in these two modules 
was strongly negatively correlated among the lines (Fig. 2b). We 
carried out gene ontology enrichment analyses 14 and found that 
module 7 was enriched for the same functional categories as male- 
biased transcripts, and module 18 was enriched for the same func- 
tional categories as female-biased transcripts. We found significant 
overrepresentations of male-biased genes in module 7 (1,241 of 1,565 
male-biased genes, %\ = 4,910; P x 0) and of female-biased genes in 
module 18 (1,381 of 1,690 female-biased genes, yj = M00; P « 0). 
Thus, the negative correlation between modules 7 and 18 is attribu- 
table to higher expression of genes in module 7 in males than females, 
and higher expression of genes in module 18 in females than males. 
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We asked whether there 
these traits, as would occur 



showed a significant 
tendency for lines that 
competitive fitness and 



Effect size (a) 

Figure 5 Distribution of SFP effects. The x axis is the SFP allele effect, 
a/us, where a is one half the difference in trait mean between the SFP 
alleles and <jg is the genetic standard deviations of each trait. The y axis is 
the minor allele frequency. The traits are color-coded: chill coma recovery 
(dark blue), starvation resistance (red), fitness (green), lifespan (purple), 
:ivity (turquoise) and copulation latency (orange). 



^ There are strong correlations among transcripts in the same func- 
£ tional pathways (for example, amino sugars metabolism and the Notch 
~ signaling pathway, Fig. 2d). Genes within a module often show similar 
= tissue-specific expression patterns 21 , which suggests a common biolo- 
gical function, genetic variation in organ size or both (Fig. 3a). Many 
c modules are enriched for known transcription factor binding sites 
(Fig. 3b and Supplementary Table 3 online,). The high degree of 
~ connectivity among transcript variation in a natural population allows 
g us to infer potential interacting partners of focal genes in networks for 
< functional studies. For example, disco, disco-r and tsh interact during 
£ embryonic and larval development 22 , and have genetically intercorre- 
*g lated transcripts in adults in module 161 (Supplementary Table 1 and 
z Fig. 3c). These genes are expressed in the adult nervous system 21 , and 
© are correlated with three other genes in this module (unc-5, drl, argos ) 
°i that are involved in nervous system development and axon guidance 23 
© (Fig. 3c). Transcriptional correlation also enables functional annota- 
tion of computationally predicted genes based on known annotations 
A3ik of other genes in the network. CG15065 is a putative immune-induced 
WS* molecule (IM) gene, judging from its strong transcriptional correla- 
tions with IM1 (r = 0.74), IM2 (r = 0.63) and IM3 (r = 0.67), physical 
location adjacent to these genes, and notable protein sequence simi- 
larity to IM1 and IM2 (Fig. 3d). 

Associations with organismal phenotypes 

We quantified variation among the 40 Raleigh inbred lines for several 
ecologically relevant traits (resistance to starvation stress, time to 
recover from a chill-induced coma, life span, a startle-induced loco- 
motor response and mating speed) as well as a measure of competitive 
fitness. We found substantial genetic variation for all traits, with 
estimates of H 2 between 0.25-0.58 (Fig. 4a-f and Supplementary 
Table 4 online). The range of variation among these lines is compar- 
able to the difference in mean phenotype between lines subjected to 
long-term divergent artificial selection for these traits 17 ' 19 . We observed 
significant sex x line interactions for starvation stress resistance, life 
span and chill coma recovery time (Supplementary Table 4). Esti- 
mates of r MF (± s.e.m.) were high for organismal phenotypes (0.72 ± 
0.11, 0.73 ± 0.11 and 0.87 ± 0.08 for starvation stress resistance, life 
span and chill coma recovery, respectively), in contrast to the low 
cross-sex genetic correlations observed at the level of transcripts. 



were significant genetic correlations among 
if segregating alleles have pleiotropic effects 
direction 1,2 . Only five genetic correlations 
(Supplementary Table 4). There is a 
from chill coma quickly to have high 
ipidly, but at the expense of surviving 
starvation stress. Lines that are resistant to starvation stress tend to 
have longer life spans, but reduced competitive fitness. Thus, there is a 
trade-off between genetic variants affecting recovery from different 
environmental stresses. We do not observe high positive correlations 
between all traits with each other and with fitness, as would be the case 
if variation among the lines was attributable to the fixation of 
deleterious alleles. 

We observed 3,316 probes containing SFPs, and assessed associa- 
tions of SFPs with organismal phenotypes. We found 119, 118, 141, 
217, 245 and 195 SFPs associated with starvation stress resistance, 
chill-coma recovery time, life span, locomotor behavior, mating speed 
and fitness, respectively (P < 0.01, Supplementary Table 5 online). 
Although some SFPs were associated with more than one trait 
(Supplementary Table 5), the number of SFPs associated with 
multiple traits did not exceed that expected by chance. Because we 
the frequency of SFPs with significant phenotype 
s well as the homozygous effect of the SFPs, we can 
evaluate the distribution of allelic effects. The homozygous effects 
follow an exponential distribution for all traits, with larger effects 
associated with the rarer SFPs, and smaller effects with the common 
SFPs (Fig. 5), as previously predicted 24 . 

We used regression models to identify transcripts that were 
associated with each organismal phenotype. At a P value of 0.01, we 
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Figure 6 Effects of P-element mutations in candidate genes affecting 
quantitative traits. Mutational effects are given as deviations from the 
co-isogenic control line. Red and blue bars represent males and females, 
respectively. Mutations in all genes shown have significant effects in one 
or both sexes (Supplementary Table 7). Error bars, s.e.m. (a) Chill coma 
recovery time, (b) Starvation stress resistance, (c) Locomotor reactivity (data 
from ref. 27). 
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found 355, 1,128, 295, 231, 691 and 414 transcripts associated candidate genes for chill coma recovery time affected this trait 

with starvation stress resistance, chill-coma recovery time, life span, (Fig. 6a,b and Supplementary Table 7 online). Six of nine mutations 

locomotor behavior, mating speed and fitness, respectively (Supple- in candidate genes affecting locomotor reactivity have been shown 

mentary Table 6 online). There was little overlap between associations previously to affect this trait 26 (Fig. 6c). 
of variation in transcripts and SFPs for the same phenotypes, further 

increasing the number of candidate genes potentially associated with Transcriptional networks associated with complex traits 

each trait. Most transcripts associated with phenotypes were either unexpected 

Transcripts that are significantly associated with organismal phe- based on prior mutational analyses of the traits, or from computa- 

notypes are candidate genes affecting the phenotype 25 . We compared tionally predicted genes. To gain insight about functional relation- 

phenotypes of P-element insertional mutations in or near candidate ships among transcripts associated with each trait, we used the 

genes with that of their co-isogenic control lines 9 ' 10 . Seven of ten residuals of the significant regressions of organismal phenotypes on 

mutations near candidate genes for resistance to starvation stress gene expression to quantify modules of transcripts with coordi- 

indeed affected starvation resistance, and 29 of 39 mutations near nated patterns of expression across the 40 lines (Fig. 7 and 




Figure 7 Modules of correlated transcripts associated with organismal phenotypes. (a-e) Competitive fitness, (a) Clustering of the 414 transcripts 
significantly associated with variation in fitness into 20 modules, (b) Tissue-specific expression of transcripts in modules 7 and 9 (ovaries), module 8 
(accessory glands and testes) and module 17 (head, brain and thoracicoabdominal ganglion), (c) Interaction network for module 7. Each node represents a 
gene and each edge the correlation between a pair of genes. Module 7 is enriched for female-biased transcripts and transcripts affecting DNA replication. 

(d) Interaction network for module 9. Module 9 is enriched for female-biased transcripts and transcripts affecting oogenesis and transcriptional regulation. 

(e) Interaction network for module 8. Module 8 is dominated by male-biased genes, and is enriched for genes involved in male-induced postmating 
behaviors, including three genes encoding accessory gland proteins (Acps). (f-g) Starvation stress resistance, (f) Clustering of the 355 transcripts 
significantly associated with variation in starvation resistance into 11 modules, (g) Interaction network for module 6. The black arrows indicate SFP variants 
in a probe set that are associated with variation in expression of the other probes in that probe set (c/'s-acting variants) and with variation in another 
transcript (frans-acting variants). The orange nodes indicate genes with a WD40 protein domain. 
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0) Figure 8 Pleiotropy between phenotypic modules. Grey lines connect 

£ modules with a significant overlap of greater than four genes between gene 

<o lists, as determined by Fisher's exact tests. 



~ Supplementary Table 6). Transcripts with spurious association to a 
= phenotype are unlikely to correlate with biologically relevant tran- 
scripts after removal of the source of the association; conversely, 
c transcripts under coordinated control are likely to show correlated 
w - expression patterns even after removal of the effect of their common 
~ relationship to a phenotype. Each of the correlated transcript 
g modules associated with a trait can be represented as an interaction 
< network, with edges between transcripts in the network determined 
£ by genetic correlations in transcript abundance exceeding a thresh- 
*j old value (Fig. 7). We identified 26 modules of correlated transcripts 
z associated with chill coma recovery time, 20 associated with fitness, 
© 1 1 with starvation stress resistance, 10 with life span and 9 each with 
°i locomotor reactivity and copulation latency (Fig. 7 and Supple- 
© mentary Table 6). 

We evaluated the biological significance of these networks by 
asking whether genes within each module were enriched for 
gene ontology categories (Supplementary Table 8 online), expression 
in particular tissues, known protein-protein interactions or shared 
domains. As expected, transcripts associated with variation for 
fitness are enriched for genes that mediate immune response (modules 
6 and 11), visual perception and function of the nervous 
system (module 17) and chemosensation (module 20), and for sex- 
specific transcripts (modules 7, 8 and 9) (Fig. 7 and Supplementary 
Table 8). Variation for fitness can be maintained if there are 
negative genetic correlations between fitness components 1 . Transcripts 
in modules 7 and 9, which have female-biased expression, are 
positively genetically correlated with each other but negatively 
genetically correlated with transcripts in module 8, which have 
male-biased expression. The genes of module 8 encode proteins that 
are transferred to females on mating, are thought to benefit male 
fitness 27 ' 28 , and that evolve rapidly 29 ' 30 , but which impose a fitness cost 
on females 31 . The molecular basis of the female response to this 
sexual conflict is not known, and could plausibly lie in the module 7 
and 9 transcripts. 

Transcripts associated with variation in starvation resistance are 
enriched for genes that mediate antimicrobial response (module 4), 
transcription (module 6) and proteolysis (module 9) (Supplementary 
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Table 8). One of the most highly connected genes in module 6, raptor, 
is a member of the TOR (Target Of Rapamycin) pathway, which has a 
key role in nutrient-sensitive signaling, regulates cell growth and 
cellular mass 32 and the use of alternative energy resources under 
starvation conditions 33 . 

As gene ontology enrichment analysis 14 revealed similarities between 
modules of correlated transcripts for the six traits (Supplementary 
Table 8), we tested whether there was more overlap of common genes 
between modules for the different traits than expected by chance, and 
uncovered substantial modular pleiotropy (Fig. 8). For example, genes 
affecting the mitochondrial ribosome are in common between chill 
coma recovery module 17 and copulation latency module 3 (P = 4.74 
x 10~ 4 ), chill coma recovery module 17 and starvation resistance 
module 8 (P = 1.17 x 10~ 3 ), and starvation resistance module 8 and 
copulation latency module 5 (P = 7.53 x 10~ 4 ); whereas genes 
affecting defense response to bacteria are in common between starva- 
tion resistance module 4 and fitness module 1 1 (4.57 x 10 -9 ) (Fig. 8 
and Supplementary Table 6). These results give insights to the 
molecular basis of pleiotropy between complex traits. 

DISCUSSION 

Our quantitative genetic analysis of whole-genome variation in 
transcript abundance among a wild-derived population of Drosophila 
inbred lines has revealed surprising features of the genetic architecture 
of transcription. Nearly 80% of the genome is expressed in adult flies, 
and approximately 90% of the expressed transcripts have sex-biased 
expression at a stringent false discovery rate. Two-thirds of the 
expressed transcripts are genetically variable in this sample of 
lines — a much greater extent of genetic variation than indicated in 
previous studies 34 ^ 3 . Over 40% of the genetically variable transcripts 
also show genetic variation in sex dimorphism. Notably, the whole 
transcriptome is highly genetically intercorrelated, with 60% of the 
variable transcripts belonging to two large modules with high positive 
genetic correlations within modules, and high negative correlations 
between modules. One of the large modules is enriched for male- 
biased transcripts and the other for female-biased transcripts. The 
genetically correlated transcriptional modules are biologically plausi- 
ble, with enrichments for transcripts in common pathways, gene 
ontology categories, tissue-specific expression and transcription factor 
binding sites. The high transcriptional connectivity at the level of 
genetic correlation of natural variation in gene expression allows us to 
infer genetic networks from the transcriptional networks, and the 
function of computationally predicted genes based on known annota- 
tions of other genes in the network. 

Several hundred transcripts and SFPs are associated with phenotypic 
variation in each quantitative trait, and 70% of P-element insertional 
mutations tested in these candidate genes indeed significantly affect the 
traits. The transcripts associated with each trait group into biologically 
plausible modules of correlated transcripts, which are in turn correlated 
between traits, providing insight into the molecular basis of genetic 
correlations. Variation in transcript abundance in young adults reared 
under standard culture conditions predicts candidate genes and 
modules of correlated transcripts associated with variation in stress 
responses, behaviors and life span. The lines and transcript data are 
therefore a valuable resource for the Drosophila community, 
enabling similar analyses for any complex phenotype that can be 
quantified. Future integration of whole-genome DNA sequence 
variation with variation in transcript abundance and phenotypes will 
allow us to disentangle causal from consequential associations, and 
determine the frequency of causal alleles. Further, the lines can be 
crossed to interrogate heterozygous effects and degrees of dominance 
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of alleles affecting transcriptional and organismal variation 44 . Knowl- 
edge of allele frequencies and homozygous and heterozygous 
effects will yield unprecedented insight into the nature of evolutionary 
forces maintaining segregating variation for complex traits in 
natural populations. 

METHODS 

Drosophila lines. We derived inbred lines from the Raleigh, USA population 
by 20 generations of full-sib mating. We used the C(2L)RM-P1, b 1 ; C(2R)RM- 
SKIA, cn 1 bw l compound autosome (CA) stock for fitness assays. P-element 
mutations and co-isogenic control lines were a gift of H. Bellen (Howard 
Hughes Medical Institute, Baylor College of Medicine). We reared flies on 

cornmeal-molassc s , r medium - - •"« rd live humidity and a 12- 

h light-dark cycle unless otherwise specified. 

Gene expression. We used Aflymetrix Drosophila 2.0 arrays to assess transcript 
profiles of 3- to 5-d-old flies from the inbred lines. All samples were frozen 
between 1 and 3 pm. We extracted RNA from two independent pools (25 flies/ 
sex/line), and hybridized 10 ug fragmented cRNA to each array. We rando- 
mized RNA extraction, labeling and array hybridization across all 
samples, and normalized the raw array data across sexes and lines using a 
median standardization. 

Each transcript is represented by 14 perfect-match 25-bp oligonucleotides. To 
identify perfect-match probes with SFPs between the wild-derived lines and the 
strain used to design the array, we quantified the maximal degree to which the 
variation between lines could be reduced by partitioning the lines into two allelic 
classes. We computed [he sum of squared dc\ unions from each class mean and 
expressed their sum as a fraction of the total sum of squares. The smallest 
fraction across all bipartitions was used to score each probe. We identified 3,136 
candidate SFPs with scores <0.1 (a tenfold reduction in the sum of squares). 
We validated polymorphisms in 20 of 21 of these SFPs by designing primers 
flanking the SFP and sequencing the PCR products (data not shown). 

Our measure of expression for each probe set was the median log 2 signal 
intensity of perfect-match probes without SFPs. We used negative control 
probes to estimate the background intensity, and removed probes below 
this threshold. 



Organismal phenotypes. For the starvation stress resistance group, we placed 
ten same-sex, 2-d-old flies in vials containing 1.5% agar and 5 ml water, and 
scored survival every eight hours (N = 5 vials/sex/line). For the chill coma 
recovery group, we placed 3- to 7-d-old flies in empty vials on ice for three 
hours, and recorded the time for each individual to right itself after transfer to 
room temperature (N = 20 flies/sex/line). For longevity, we placed five 1- to 
| 2-d-old same-sex virgin flies into vials containing 5 ml medium, and recorded 
" survival every two days (N = 5 vials/sex/line). For locomotor reactivity, we 
placed single 3- to 7-d-old flies into vials containing 5 ml medium. The 
following day, between 8 am and 12 pm, we mechanically disturbed each fly 19 , 
and recorded the total activity in the 45 s immediately following the 
disturbance. We obtained two replicate measurements of 20 flies/sex/repli- 
eale/line. for [he copulation latency group, we aspirated pairs of 3- to 7-d-old 
virgin flies into vials containing 5 ml medium between 8 am and 12 pm, and 
recorded the number of minutes until initiation of copulation, for a maximum 
of 120 min (N = 24 pairs/line). For the reproductive fitness group, we used the 
competitive index technique 45,46 . We reared all wild-type and CA parents in 
constant density (10 pairs) vials. We placed six 3- to 4-d-old virgin CA males 
and females and three 3- to 4-d-old wild-type males and females in a vial 
containing 10 ml medium, discarding the flies after 7 d. The competitive index 
was the ratio of the number of wild type to the total number of progeny 
emerging by day 17 (N = 20 replicate vials/line). 

Quantitative genetic analyses. We used ANOVA to partition phenotypic 
variation between sexes (S, fixed), fines (I, random), the S x L interaction 
(random) and the error variance (e). We also carried out reduced ANOVAs by 
sex. We estimated broad-sense heritabilities (H 2 ) as H 2 = (a 2 L + <tsl)I{<Tl + "si 
+ dj), where <j\, a\ L , and a\ are the among-line, sex x line and within-line 
variance components, respectively. For the analyses by sex, H 2 = rr\j{a\ + <ri). 
We estimated cross-trait (cross-sex) genetic correlations as = cov^afjj, 



where cov^ is the covariance of line means between traits i and (males and 
females), and <7; and Oj are the square roots of the among line variance 
components for the two traits (males and females). 

Transcriptional modules. To identify modules of genetically correlated tran- 
scripts, we computed the correlation r» between all pairs of significant 
transcripts i and j. The absolute correlationsr |r;j| were transformed to define 

edge weights e ^ in a graph of genes indexed by the free parameter a. We 
determined the clustering P = {V h V^} and value of a that jointly 
the modularity function 



Mv, v) 



where A 0 (X,F) denotes the total edge weight in the graph indexed by a 
that connects any vertex in set X to a vertex in set Y. The optimal partition 
P = {V\, VjJ defines k transcriptional modules Vi, ..... V% 

Transcript-phenotype associations. We used regression models ( Y = « + S + T 
+ Sx T + e, where S denotes sex and T the trait covariate) to identify transcripts 
significantly (P < 0.01) associated with organismal phenotypic variation in 
both sexes. We used the residuals from regression models (Y = \i + E + S + 
SxE + e, where E is the covariate median logj expression level l to compute 
genetic correlations between transcripts significantly associated with each 
phenotype for module construction. 

Pleiotropic modules. To identify transcriptional modules common to more 
than one phenotype. we considered pans of phenolvpes, comparing the lists of 
significant transcripts for each module from the first phenolvpe to each module 
from the second. We used Fisher's exact test to quantify the extent that the 
overlap between the two modules exceeded the chance expectation. 

Transcription factor motifs. We scored 5 UTR sequences of each D. melano- 
gaster transcript against position weight matrices for 5(> transcription factors; 
100 permutations of each sequence were used to generate an empirical 
distribution of scores for each motif. 'Present' motifs had scores within the 
top 5% of the permutation distribution. We determined the genome-wide 
proportion of genes for which each motif was present, and compared the 
proportion of genes within a gene list or module for which a motif was present 
to the genome-wide proportion using a one-sided binomial test. 
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